The objective of this study is to analyze and discuss the metrics of a predictive model built with the
K-nearest neighbors (K-NN) learning algorithm and applied to data on engineering students' perception of the
quality of the virtual administrative service. As part of the methodology, the accuracy, precision, sensitivity
and specificity indicators were analyzed from the confusion matrix and the receiver operating characteristic
(ROC) curve. The collected data were validated through Cronbach's alpha, with consistency values above 0.9,
which allowed the analysis to proceed. From the predictive model obtained with MATLAB R2021a, it was
concluded that the average metrics over all classes are optimal, with an average per-class accuracy of 92.77%,
a sensitivity of 86.62% and a specificity of 94.7%, and an overall accuracy of 85.5%. In turn, the highest area
under the curve (AUC) is 0.98, so the model is considered an optimal predictive model. This study can
contribute significantly to the higher education institution's decision-making on improving the quality of the
virtual administrative service.
1. INTRODUCTION


Today it is essential for organizations to have the tools to capture, analyze and adapt to change; all the more
so in the current highly competitive institutional environment, where a culture of organizational learning is
considered key to goal-oriented decision-making, to securing a lasting place in the market and to going beyond
it [1], [2]. In the current virtual context, educational institutions apply techniques to model their users'
perception of the quality of the service they provide, whether academic, pedagogical or administrative [3], [4].
Modeling user behavior is one of the techniques most used by organizations, mainly in order to develop content
personalized according to the level of user-platform interaction [5], [6].

In the educational field, the objective is to facilitate learning activities for users, agile access to
information and the management of the required resources [7]. Improving the level of personalization of the
content associated with an information system generates a positive perception among users and makes the
services provided more efficient [8], [9]. At present, information technologies make it possible to store and
manage large amounts of data; applied to virtual education environments, this makes it possible to personalize
the content presented to users through classification models [10], [11].

Classification models rely on groups defined a priori, so that, through a machine learning model and known
information inputs, the unit of study can be assigned to a specific group; in this technique, dependent and
independent variables are defined [12], [13]. Machine learning is a branch of artificial intelligence based on
algorithms that allow a system to modify its behavior based on experience or knowledge acquired autonomously,
in order to facilitate the extraction of relevant information [14], [15]. This field groups together a wide range
of algorithms aimed at solving various problems, such as feature selection, classification, clustering or data
imputation, among others [16], [17]. The choice of algorithm depends on the type of information analyzed, which
allows higher-quality information to be obtained and processing times to be improved [18], [19].

Among the algorithms most used for classification in machine learning is K-nearest neighbors (K-NN), given
its simplicity and efficiency in detecting and classifying elements into categories [20], [21]. This supervised
learning algorithm works with several descriptive attributes and a single target attribute (also called the
class) [22]. The parameter k in K-NN is the number of neighbors used to determine membership in a category; it
is usually determined empirically: depending on the problem, different values of k are tested and the one with
the best precision is chosen [23], [24]. Given the above, the present study aims to analyze and discuss the
metrics of the predictive model obtained with the K-NN supervised learning algorithm, in its medium K-NN
configuration, applied to data on engineering students' perception of the quality of the virtual administrative
service. To this end, the accuracy, precision, sensitivity and specificity indicators are analyzed, based on the
confusion matrix and the receiver operating characteristic (ROC) curve, with the purpose of providing university
managers with relevant, high-quality information to improve decision-making.
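To make the role of k concrete, the following is a minimal sketch of K-NN classification by majority vote, written in Python with NumPy. It assumes Euclidean distance; k = 10 is used as the default because that is our understanding of MATLAB's Medium KNN preset, but this is an assumption and the sketch is not MATLAB's implementation. The function name `knn_predict` and the toy data are hypothetical and are not the study's data.

```python
import numpy as np

def knn_predict(X_train, y_train, X_query, k=10):
    """Predict a class for each query row by majority vote among its k nearest
    training rows (Euclidean distance). Ties are broken in favour of the
    smallest class label, an arbitrary choice made for this sketch."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    preds = []
    for x in np.atleast_2d(np.asarray(X_query, dtype=float)):
        dists = np.linalg.norm(X_train - x, axis=1)      # distance to every training row
        nearest = np.argsort(dists, kind="stable")[:k]    # indices of the k closest rows
        labels, counts = np.unique(y_train[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])           # majority vote (smallest label on ties)
    return np.array(preds)

# Hypothetical toy data: three Likert-scale predictors (1 to 4) and a class label.
X = [[1, 1, 1], [1, 2, 1], [2, 1, 1], [4, 4, 4], [4, 3, 4], [3, 4, 4]]
y = [1, 1, 1, 4, 4, 4]
pred = knn_predict(X, y, [[1, 1, 2], [4, 4, 3]], k=3)
print(pred)  # [1 4]
```

Small values of k follow the local structure closely but react to noisy responses, while large values smooth the decision; this trade-off is why k is usually tuned empirically.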


2. RESEARCH METHOD 

In terms of the degree of measurement and analysis of the information, the research is descriptive: it analyzes
and discusses the metrics of the predictive model obtained with the medium K-NN supervised learning algorithm,
applied to data on engineering students' perception of the quality of the virtual administrative service. The
methodology is therefore based on building a predictive model in MATLAB. Figure 1 shows the proposed
methodology based on the medium K-NN supervised learning algorithm.


[Figure 1: data collection (instrument) → data processing in MATLAB (Classification Learner) → classification
using the medium K-NN supervised learning algorithm → validation (confusion matrix, ROC curve) and parallel
coordinates plot → analysis of the predictive model metrics]


Figure 1. Methodology of the proposed medium K-NN algorithm

The data on the perception of the quality of the virtual administrative service were collected with the survey
technique, using a questionnaire as the instrument, with responses on a Likert scale from 1 to 4 representing
levels from dissatisfied to very satisfied; in the analysis, these satisfaction levels are the classes of the
model. The survey was conducted virtually because of the health emergency declared due to Covid-19, and was
applied to all 651 students from the seventh to the tenth cycle of the professional engineering schools, a
criterion established in a regulation approved by the university. As part of the methodology, the collected data
were validated through Cronbach's alpha coefficient in SPSS. Table 1 shows that the values obtained indicate high
homogeneity and equivalence of the responses across all the indicators, since "values greater than 0.9 indicate
a great consistency of the elements of the scale" [25]. Table 1 also lists the indicators, called predictors, of
the quality of the administrative service. Based on these results, the analysis can proceed.
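As a reference for the consistency check, the standard scale-level formula is alpha = k/(k − 1) · (1 − Σσᵢ² / σ_T²), where k is the number of items, σᵢ² the variance of item i and σ_T² the variance of the summed score. The Python sketch below implements only this formula; how SPSS produced the per-indicator values in Table 1 is not stated in the text, so the sketch does not try to reproduce them, and the name `cronbach_alpha` and the toy responses are hypothetical.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, k_items) matrix of item scores:
    alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)        # sample variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)    # sample variance of the summed scale score
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Hypothetical Likert responses (rows: respondents, columns: items), not the survey data.
responses = [[1, 1], [2, 3], [3, 2]]
alpha = cronbach_alpha(responses)   # 2/3 for this toy matrix
```

Items that move together inflate the variance of the total score relative to the sum of item variances, which pushes alpha toward 1.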


Table 1. Cronbach's alpha values

Code   Indicator                        Cronbach's alpha
I1     Efficient work                   0.944
I2     Timely attention                 0.941
I3     Relevant information provided    0.942
O1     Quality care                     0.935


3. RESULTS AND DISCUSSION 
3.1. Determination of the predictive model 

This is the training stage, in which a classification algorithm builds a model by learning from a set of training
data. To determine the predictive model, the data collected on the first three indicators in Table 1 (I1, I2 and
I3), called predictors, were used, while the quality of the virtual administrative service is represented by
indicator O1. Using MATLAB's Classification Learner app and the Statistics and Machine Learning Toolbox 12.1,
the best type of predictive model was identified according to its validation accuracy. The results generated by
MATLAB R2021a are shown in Table 2.

According to Table 2, of all the learning algorithms, the best type of classification model was obtained with the
K-NN supervised learning algorithm in its medium K-NN configuration, with a validation accuracy of 85.5%.
Regarding validation percentages, [21] points out that, with a precision of 77.85%, the K-NN algorithm can be
said to classify unbalanced data well. As indicated in [26], the classification and forecasting process shows good
precision when the medium K-NN algorithm is used, with benefits in ease of interpretation and comparability of
results; in that case, its 91% accuracy exceeds the performance of the linear and quadratic classifiers by 7.1%.
In the university environment, it is important for the educational institution to know the different ways in
which students relate to the educational process, and even better if it can take actions to facilitate learning
and student satisfaction, rethinking the management of the virtual environments of the educational service.


Table 2. Results of classification learner 


Algorithm Accuracy (validation) 
Medium K-NN 85.5% 
Bilayered neural network 85.4% 
Bagged trees 85.3% 


3.2. Results of the predictive model metrics 

The confusion matrix is a useful tool to analyze how well a classification model predicts outcomes across a large
number of classes [27]. As indicated in [28], the K-NN algorithm requires the value of k to be known in advance in
order to find the k nearest neighbors, and the confusion matrix is a common procedure for evaluating different
K-NN configurations. Figure 2 shows the confusion matrix in terms of the number of observations; this matrix
contains information about the predictions made by the classification system and reports the true positive rate
(TPR) and the false negative rate (FNR), showing how closely the satisfaction levels predicted by the model
(predicted class) match their true values (true class). Rajagopal et al. [19] indicate that the TPR is the
proportion of positive samples correctly classified as positive and the FNR is the proportion of positive samples
incorrectly classified as negative; these two rates, together with the positive predictive value (PPV) and the
false discovery rate (FDR) shown in Figure 3, define the performance metrics of the predictive model.
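The four rates can be read directly from a confusion matrix. The Python sketch below assumes the layout described for Figures 2 and 3 (rows are true classes, columns are predicted classes): TPR and FNR are row-wise shares, while PPV and FDR are column-wise shares. The function name and the 3-class counts are hypothetical and only illustrate the arithmetic; they are not the study's confusion matrix.

```python
import numpy as np

def per_class_rates(cm):
    """Per-class TPR, FNR, PPV and FDR from a confusion matrix whose rows are
    true classes and whose columns are predicted classes."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                 # correctly classified observations per class
    tpr = tp / cm.sum(axis=1)        # row-wise: share of each true class recovered
    ppv = tp / cm.sum(axis=0)        # column-wise: share of each predicted class that is correct
    return {"TPR": tpr, "FNR": 1 - tpr, "PPV": ppv, "FDR": 1 - ppv}

# Hypothetical 3-class counts, for illustration only (not the study's data).
cm = [[8, 1, 1],
      [2, 6, 2],
      [0, 1, 9]]
rates = per_class_rates(cm)
```

For the toy matrix, class 2 has a TPR of 0.6 (6 of its 10 true members recovered) but a PPV of 0.75 (6 of the 8 predictions of class 2 are correct), which is the same kind of gap discussed below between sensitivity and precision.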


K-NN supervised learning algorithm in the predictive analysis of ... (Omar Freddy Chamorro Atalaya) 


524 o ISSN: 2502-4752 


Figure 2 shows that, of the 4 classes on which the predictive model acted, class 1 (satisfaction level:
dissatisfied) has the highest sensitivity, 89.7%; that is, in this class the model best distinguishes true positives
(TP) from false negatives (FN), with a false negative rate of only 10.3%, which is considerably low. Although all
the values are high, the lowest sensitivity of the predictive model is found for class 2 (satisfaction level: not
very satisfied), at 82.2%. Figure 2 also shows, to the right of the confusion matrix, the TPR and FNR for each
class. Regarding the validation of true positives and false negatives, Mhaske-Dhamdhere and Vanjale [29] applied
the medium K-NN algorithm to 160 emails from computer engineering students and obtained true positive rates of
67% and 80%; according to users, these high values are due to the quality of the email service in terms of its
technical parameters and structure.

Figure 3 shows a second confusion matrix, whose main diagonal values indicate the precision of the predictive
model for each class. The model has its best precision for class 3 (satisfaction level: satisfied), at 89.9%, the
highest among all classes, while the lowest precision is found for class 4 (satisfaction level: very satisfied), at
80.2%. Although class 3 has a sensitivity of 85.6% (Figure 2), its precision is 89.9%, which indicates that the
dispersion of the data used is very low, since low dispersion corresponds to high precision. Finally, the lower
part of the confusion matrix in Figure 3 shows the PPV and the FDR for each class.


[Figure 2: confusion matrix of true class versus predicted class (classes 1 to 4), with the per-class TPR and FNR
shown to the right]


Figure 2. Confusion matrix based on TPR and FNR rates

[Figure 3: confusion matrix of true class versus predicted class (classes 1 to 4), with the per-class PPV and FDR
shown below; PPV for predicted classes 1 to 4: 85.7%, 81.4%, 89.9%, 80.2%; FDR: 14.3%, 18.6%, 10.1%, 19.8%]


Figure 3. Confusion matrix based on PPV and FDR rates


As noted, the four possible outcomes of a classification (true positive, false negative, false positive and true
negative) determine meaningful performance metrics, such as accuracy (A), precision (P), sensitivity (S) and
specificity (R). Table 3 shows these metrics for each class of the predictive model; all four metrics show
relatively high values in the 4 classes. Overall, the accuracy of the predictive model is 85.5%.
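For reference, the standard one-vs-rest definitions of these metrics are written out below, treating class c as the positive class and C as the confusion matrix with rows as true classes and columns as predicted classes (the layout of Figures 2 and 3), with N observations in total. We assume, though the text does not state it, that the per-class values in Table 3 follow these definitions. Because the per-class accuracy A_c also counts the true negatives, it is higher than the overall accuracy, which only counts the diagonal of C; this is consistent with the per-class accuracies in Table 3 (88.83% to 97.11%) exceeding the overall 85.5%.

```latex
\begin{aligned}
TP_c &= C_{cc}, \qquad
FN_c = \sum_{j} C_{cj} - C_{cc}, \qquad
FP_c = \sum_{i} C_{ic} - C_{cc}, \qquad
TN_c = N - TP_c - FN_c - FP_c,\\[4pt]
S_c &= \frac{TP_c}{TP_c + FN_c} = 1 - FNR_c, \qquad
R_c = \frac{TN_c}{TN_c + FP_c}, \qquad
P_c = \frac{TP_c}{TP_c + FP_c} = 1 - FDR_c,\\[4pt]
A_c &= \frac{TP_c + TN_c}{N}, \qquad
A_{\text{overall}} = \frac{1}{N}\sum_{c} C_{cc}, \qquad
N = \sum_{i}\sum_{j} C_{ij}.
\end{aligned}
```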

Regarding the sensitivity of the predictive model obtained with the K-NN supervised learning algorithm, MATLAB
provides a ROC graph for each class under study. The ROC curve represents the relationship between sensitivity
(true positive rate) and 1 − specificity (false positive rate). Figure 4 shows the ROC curve of class 1
(dissatisfied), where the operating point of the classifier lies at a true positive rate of 0.90 and a false
positive rate of 0.02, with an AUC of 0.98, an almost optimal value very close to one. Similarly, Figure 5 shows
the ROC curve of class 4 (very satisfied), whose operating point lies at a true positive rate of 0.89 and a false
positive rate of 0.03, with an AUC of 0.96; the closer the AUC is to 1, the better the model.
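The following Python sketch shows how a one-vs-rest ROC curve and its AUC can be computed for a single class: the decision threshold is swept over the observed scores, each threshold gives one (FPR, TPR) point, and the area is accumulated with the trapezoid rule. As a simplifying assumption, the score of an observation for a class is taken to be something like the share of its k neighbors voting for that class; the function name, labels and scores are hypothetical and are not the values behind Figures 4 and 5.

```python
import numpy as np

def roc_one_vs_rest(scores, y_true, positive):
    """ROC curve (FPR, TPR) and trapezoidal AUC for one class treated as the
    positive class, sweeping the decision threshold over the observed scores."""
    scores = np.asarray(scores, dtype=float)
    is_pos = np.asarray(y_true) == positive
    thresholds = np.unique(scores)[::-1]          # strictest threshold first
    # Predict "positive" when score >= threshold; the curve starts at (0, 0).
    tpr = np.array([0.0] + [np.mean(scores[is_pos] >= t) for t in thresholds])
    fpr = np.array([0.0] + [np.mean(scores[~is_pos] >= t) for t in thresholds])
    auc = float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2))  # trapezoid rule
    return fpr, tpr, auc

# Hypothetical true classes and scores for class 1 (not the study's data).
y_true = [1, 1, 2, 3, 4, 1]
score_class1 = [0.9, 0.7, 0.6, 0.1, 0.2, 0.4]
fpr, tpr, auc = roc_one_vs_rest(score_class1, y_true, positive=1)
```

Here 8 of the 9 (class 1, other class) pairs are ranked correctly, so the AUC is 8/9, about 0.89; an AUC of 1 would mean every class 1 observation scores higher than every other observation.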


Table 3. Metrics of the predictive model by class

Class   Sensitivity   Specificity   Accuracy   Precision
1       89.66%        98.07%        97.11%     85.71%
2       82.19%        92.44%        89.49%     81.45%
3       85.63%        91.63%        88.83%     89.94%
4       89.00%        96.67%        95.66%     80.18%

Figure 4. Validation ROC curve for class 1
Figure 5. Validation ROC curve for class 4


Regarding ROC validation, Susheelamma and Ravikuma [22] report an improvement in ROC performance, reflecting
that their proposed learning model improves the F-measure by 24.45%, 26.65% and 18.96% over the existing
learning model, an average improvement of 23.35%. Finally, after validating the collected data and obtaining the
predictive model with MATLAB R2021a, it was concluded that the average metrics over all classes were optimal,
with an average per-class accuracy of 92.77%, a sensitivity of 86.62% and a specificity of 94.7%, and an overall
accuracy of 85.5%. In turn, the highest AUC is 0.98, so the model is considered an optimal predictive model.
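The average metrics can be checked against Table 3. The short Python sketch below assumes the averages are unweighted (macro) means over the four classes; under that assumption it reproduces the sensitivity, specificity and per-class accuracy averages, and it also gives the mean of the precision column.

```python
# Per-class values transcribed from Table 3 (classes 1 to 4, in %).
table3 = {
    "sensitivity": [89.66, 82.19, 85.63, 89.00],
    "specificity": [98.07, 92.44, 91.63, 96.67],
    "accuracy":    [97.11, 89.49, 88.83, 95.66],
    "precision":   [85.71, 81.45, 89.94, 80.18],
}

# Unweighted (macro) mean over the four classes.
macro = {name: sum(vals) / len(vals) for name, vals in table3.items()}
for name, value in macro.items():
    print(f"{name}: {value:.2f}%")
# sensitivity: 86.62%
# specificity: 94.70%
# accuracy: 92.77%
# precision: 84.32%
```

A macro mean gives each satisfaction level the same weight regardless of how many students fall into it; a mean weighted by class size would generally give different figures.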

Regarding the findings of [1], it is highlighted that, from the perspective of innovation, the results of these
studies bring about major changes and also make it possible to delegate functions, promote skills and encourage
continuous updating, all from a visionary leadership approach. Likewise, [22] indicates that the proposed model
predicts accurately even for the earliest days (that is, between days 0 and 1), also predicts efficiently for the
days after the end of the course, and achieves better results than training with legacy data. As indicated in
[30], there is a strong tendency to predict student performance in college. Research on predicting student
behavior in the academic environment is therefore of great interest, because within an organization the quality
of service is essential, since user satisfaction depends on it. The results of Ghouch et al. [31] indicate that
integrating the K-NN algorithm into the educational environment makes it possible to find students with similar
behaviors; on the one hand, this offers a learning path adapted to the student's profile and based on the
experiences of other students with similar behaviors, by observing and analyzing their learning footprints, and,
on the other hand, it overcomes the limitations of the K-NN algorithm in terms of computational time and memory.




4. CONCLUSION 

In relation to the results obtained, an optimal predictive model is evidenced, with an accuracy of 85.5% using
the K-NN supervised learning algorithm. Likewise, the cross-validation procedure applied to the K-NN algorithm
shows optimal results, with all the model metrics (sensitivity, specificity, accuracy and precision) reaching
relatively high values in the 4 classes. These results make it possible to group engineering students according
to the level of satisfaction they can reach, based on the indicators called predictors; with this information,
the authorities will be able to make timely decisions to increase the percentage of satisfied students and reduce
the percentage of dissatisfied students with respect to the quality of the virtual administrative service. In
addition, these classification techniques make it possible to extract relevant, higher-quality information from
stakeholders in less time. The research also provides added value, since it gives the scientific and academic
community a methodology to classify students participating in virtual environments, identifying the relationship
between the predictive elements and the satisfaction with the quality of the administrative service provided
virtually in this context.